The dataset was downloaded from Kaggle. The data come from:
Zieba, M., Tomczak, S. K., & Tomczak, J. M. (2016). Ensemble Boosted Trees with Synthetic Features Generation in Application to Bankruptcy Prediction. Expert Systems with Applications.
The bankrupt companies were observed in 2000-2012, and the still-operating companies in 2007-2013.
The dataset consists of 5 files. Each file contains the values of 64 financial ratios and a binary variable `class` indicating whether the company declared bankruptcy after 5, 4, 3 or 2 years or after 1 year, for the successive files respectively. Each year's file contains between 5,000 and 10,000 companies.
### Libraries
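library(readr)     # read_csv()
library(psych)
library(naniar)    # vis_miss(): missing-data visualisation
library(corrplot)  # correlation-matrix plots
library(e1071)     # moment(): used for the skewness coefficient
library(polycor)   # correlations between numeric variables and factors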
X1year <- read_csv("../dane/1year.csv",
                   col_types = cols(.default = col_double(),  # Attr1-Attr64
                                    class = col_integer()))
As the missing-data visualisation below shows, missing values are relatively rare (1.3% of all cells), but they are concentrated in certain variables and rows. The console printout shows the summary statistics and the number of missing values (NA's) for each variable. Variables with 70 or more missing values and rows with 5 or more missing values will be removed.
## Attr1 Attr2 Attr3 Attr4
## Min. :-61.60200 Min. : 0.0000 Min. :-440.5000 Min. : 0.00226
## 1st Qu.: 0.01275 1st Qu.: 0.3195 1st Qu.: 0.0088 1st Qu.: 1.01990
## Median : 0.06347 Median : 0.5154 Median : 0.1672 Median : 1.42205
## Mean : -0.01042 Mean : 1.0034 Mean : -0.3008 Mean : 2.61909
## 3rd Qu.: 0.14446 3rd Qu.: 0.7250 3rd Qu.: 0.3593 3rd Qu.: 2.27018
## Max. : 1.62030 Max. :441.5000 Max. : 0.9962 Max. :261.50000
## NA's :3
## Attr5 Attr6 Attr7
## Min. :-2722100.0 Min. :-397.8900 Min. :-61.60200
## 1st Qu.: -51.8 1st Qu.: 0.0000 1st Qu.: 0.01775
## Median : -13.3 Median : 0.0000 Median : 0.07787
## Mean : -3043.1 Mean : -0.3866 Mean : 0.05406
## 3rd Qu.: 28.1 3rd Qu.: 0.1014 3rd Qu.: 0.17662
## Max. : 82440.0 Max. : 1.6774 Max. : 9.52930
## NA's :1
## Attr8 Attr9 Attr10 Attr11
## Min. : -2.0032 Min. : 0.00142 Min. :-440.5500 Min. :-0.58636
## 1st Qu.: 0.3508 1st Qu.: 1.03210 1st Qu.: 0.2557 1st Qu.: 0.02948
## Median : 0.8670 Median : 1.17730 Median : 0.4544 Median : 0.09463
## Mean : 2.3395 Mean : 2.06796 Mean : 0.0328 Mean : 0.13718
## 3rd Qu.: 2.0095 3rd Qu.: 2.18710 3rd Qu.: 0.6542 3rd Qu.: 0.19605
## Max. :260.5000 Max. :194.18000 Max. : 58.7250 Max. : 9.54730
## NA's :2 NA's :35
## Attr12 Attr13 Attr14 Attr15
## Min. :-5.19700 Min. :-607.4200 Min. :-61.60200 Min. :-307910
## 1st Qu.: 0.03532 1st Qu.: 0.0293 1st Qu.: 0.01775 1st Qu.: 321
## Median : 0.19505 Median : 0.0667 Median : 0.07787 Median : 933
## Mean : 0.65010 Mean : -0.6402 Mean : 0.05406 Mean : 6660
## 3rd Qu.: 0.56471 3rd Qu.: 0.1270 3rd Qu.: 0.17662 3rd Qu.: 2408
## Max. :30.65900 Max. : 4.9366 Max. : 9.52930 Max. :3599100
## NA's :3 NA's :1
## Attr16 Attr17 Attr18 Attr19
## Min. :-1.51870 Min. : 0.00226 Min. :-61.60200 Min. :-622.0600
## 1st Qu.: 0.09293 1st Qu.: 1.37820 1st Qu.: 0.01775 1st Qu.: 0.0089
## Median : 0.25305 Median : 1.93650 Median : 0.07787 Median : 0.0410
## Mean : 0.74721 Mean : 3.45308 Mean : 0.05406 Mean : -0.6932
## 3rd Qu.: 0.62628 3rd Qu.: 3.12185 3rd Qu.: 0.17662 3rd Qu.: 0.0932
## Max. :31.58700 Max. :261.50000 Max. : 9.52930 Max. : 4.6252
## NA's :2 NA's :2
## Attr20 Attr21 Attr22 Attr23
## Min. : 0.00 Min. : 0.2368 Min. :-0.49360 Min. :-634.5900
## 1st Qu.: 16.45 1st Qu.: 1.0167 1st Qu.: 0.02046 1st Qu.: 0.0063
## Median : 37.56 Median : 1.1324 Median : 0.08111 Median : 0.0331
## Mean : 79.53 Mean : 2.7854 Mean : 0.12261 Mean : -0.7168
## 3rd Qu.: 62.58 3rd Qu.: 1.2789 3rd Qu.: 0.18003 3rd Qu.: 0.0773
## Max. :25271.00 Max. :1088.3000 Max. : 6.61680 Max. : 4.6252
## NA's :274
## Attr24 Attr25 Attr26 Attr27
## Min. :-61.60200 Min. :-459.5600 Min. :-1.51870 Min. :-14790.0
## 1st Qu.: 0.02725 1st Qu.: 0.1681 1st Qu.: 0.08299 1st Qu.: 0.2
## Median : 0.13128 Median : 0.3522 Median : 0.22173 Median : 1.4
## Mean : 0.11383 Mean : -0.0943 Mean : 0.67498 Mean : 1502.1
## 3rd Qu.: 0.29669 3rd Qu.: 0.5716 3rd Qu.: 0.55951 3rd Qu.: 7.5
## Max. : 2.53290 Max. : 52.3290 Max. :29.49900 Max. :963640.0
## NA's :13 NA's :2 NA's :128
## Attr28 Attr29 Attr30 Attr31
## Min. :-83.3030 Min. :0.9764 Min. : -5.209 Min. :-622.0600
## 1st Qu.: 0.0186 1st Qu.:3.6936 1st Qu.: 0.107 1st Qu.: 0.0136
## Median : 0.4231 Median :4.0940 Median : 0.218 Median : 0.0456
## Mean : 2.7275 Mean :4.1442 Mean : 10.515 Mean : -0.6833
## 3rd Qu.: 1.2447 3rd Qu.:4.5536 3rd Qu.: 0.390 3rd Qu.: 0.0999
## Max. :884.8500 Max. :6.4404 Max. :9238.900 Max. : 4.6252
## NA's :7
## Attr32 Attr33 Attr34 Attr35
## Min. : 0.0 Min. : 0.000 Min. : -1.4559 Min. :-0.55967
## 1st Qu.: 50.9 1st Qu.: 2.974 1st Qu.: 0.2028 1st Qu.: 0.01830
## Median : 80.8 Median : 4.567 Median : 1.7861 Median : 0.07792
## Mean : 562.0 Mean : 7.461 Mean : 4.2424 Mean : 0.12217
## 3rd Qu.: 125.0 3rd Qu.: 7.170 3rd Qu.: 4.1040 3rd Qu.: 0.18398
## Max. :351630.0 Max. :537.950 Max. :537.9500 Max. : 7.13970
## NA's :2 NA's :3 NA's :2
## Attr36 Attr37 Attr38 Attr39
## Min. : 0.00394 Min. : -41.728 Min. :-440.5500 Min. :-701.6300
## 1st Qu.: 1.29820 1st Qu.: 1.380 1st Qu.: 0.3670 1st Qu.: 0.0098
## Median : 1.86510 Median : 3.451 Median : 0.5535 Median : 0.0415
## Mean : 2.45535 Mean : 51.639 Mean : 0.1191 Mean : -0.7118
## 3rd Qu.: 2.67450 3rd Qu.: 9.976 3rd Qu.: 0.7180 3rd Qu.: 0.0913
## Max. :194.18000 Max. :4770.000 Max. : 60.1140 Max. : 4.9681
## NA's :379
## Attr40 Attr41 Attr42 Attr43
## Min. : -0.16910 Min. :-10.9690 Min. :-701.6300 Min. : 4.4
## 1st Qu.: 0.04444 1st Qu.: 0.0343 1st Qu.: 0.0112 1st Qu.: 65.3
## Median : 0.13541 Median : 0.0907 Median : 0.0431 Median : 96.5
## Mean : 0.82839 Mean : 0.8155 Mean : -0.7110 Mean : 1112.4
## 3rd Qu.: 0.42594 3rd Qu.: 0.2167 3rd Qu.: 0.0926 3rd Qu.: 131.8
## Max. :176.45000 Max. :328.6900 Max. : 4.9681 Max. :919500.0
## NA's :3 NA's :6
## Attr44 Attr45 Attr46 Attr47
## Min. : 0.0 Min. :-711.430 Min. : 0.00059 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 0.065 1st Qu.: 0.52482 1st Qu.: 17.34
## Median : 51.9 Median : 0.321 Median : 0.85587 Median : 40.22
## Mean : 1032.8 Mean : 5.280 Mean : 1.79316 Mean : 57.41
## 3rd Qu.: 76.9 3rd Qu.: 0.938 3rd Qu.: 1.52440 3rd Qu.: 68.79
## Max. :894230.0 Max. :3730.000 Max. :176.45000 Max. :4510.20
## NA's :25 NA's :4 NA's :1
## Attr48 Attr49 Attr50 Attr51
## Min. :-3.71400 Min. :-716.2600 Min. : 0.00226 Min. : 0.0000
## 1st Qu.:-0.02022 1st Qu.: -0.0124 1st Qu.: 0.77364 1st Qu.: 0.2416
## Median : 0.04053 Median : 0.0221 Median : 1.13460 Median : 0.3956
## Mean : 0.06446 Mean : -0.7641 Mean : 2.15812 Mean : 0.8968
## 3rd Qu.: 0.13772 3rd Qu.: 0.0692 3rd Qu.: 1.80530 3rd Qu.: 0.5840
## Max. : 6.29060 Max. : 4.6567 Max. :261.50000 Max. :441.5000
## NA's :2
## Attr52 Attr53 Attr54 Attr55
## Min. : 0.0000 Min. : -98.122 Min. : -82.303 Min. :-189030.0
## 1st Qu.: 0.1391 1st Qu.: 0.666 1st Qu.: 0.922 1st Qu.: 5.7
## Median : 0.2167 Median : 1.131 Median : 1.328 Median : 1280.9
## Mean : 0.9877 Mean : 7.532 Mean : 7.950 Mean : 5478.4
## 3rd Qu.: 0.3361 3rd Qu.: 1.951 3rd Qu.: 2.113 3rd Qu.: 4914.4
## Max. :453.9600 Max. :4247.000 Max. :4347.500 Max. : 537580.0
## NA's :1 NA's :7 NA's :7
## Attr56 Attr57 Attr58
## Min. :-701.6300 Min. :-315.37000 Min. : 0.0000
## 1st Qu.: 0.0169 1st Qu.: 0.03944 1st Qu.: 0.8737
## Median : 0.0553 Median : 0.16610 Median : 0.9451
## Mean : -0.6815 Mean : -0.12735 Mean : 1.6976
## 3rd Qu.: 0.1299 3rd Qu.: 0.34360 3rd Qu.: 0.9835
## Max. : 1.0000 Max. : 7.26740 Max. :702.6300
##
## Attr59 Attr60 Attr61 Attr62
## Min. :-42.71700 Min. : 0.014 Min. : 0.0004 Min. : 0
## 1st Qu.: 0.00000 1st Qu.: 5.732 1st Qu.: 4.7431 1st Qu.: 46
## Median : 0.02488 Median : 9.462 Median : 7.0370 Median : 72
## Mean : 0.22589 Mean : 60.582 Mean : 11.9421 Mean : 7926
## 3rd Qu.: 0.25205 3rd Qu.: 20.489 3rd Qu.: 11.3643 3rd Qu.: 116
## Max. : 31.47200 Max. :19157.000 Max. :749.0000 Max. :7276000
## NA's :25 NA's :1
## Attr63 Attr64 class
## Min. : 0.000 Min. : 0.000 Min. :0.0000
## 1st Qu.: 3.145 1st Qu.: 2.688 1st Qu.:0.0000
## Median : 5.068 Median : 4.778 Median :0.0000
## Mean : 8.256 Mean : 36.398 Mean :0.2675
## 3rd Qu.: 7.880 3rd Qu.: 10.119 3rd Qu.:1.0000
## Max. :545.950 Max. :14043.000 Max. :1.0000
## NA's :3 NA's :7
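The overall share of missing values and the per-variable and per-row counts mentioned above can be computed with base R alone. A minimal sketch; the helper name `summariseNA` and the toy data frame are hypothetical, for illustration only:

```r
# Hypothetical helper: summarise missing values in a data frame
summariseNA <- function(data) {
  na <- is.na(data)                # logical matrix, TRUE where a value is missing
  list(
    share  = mean(na),             # overall fraction of missing cells
    perCol = colSums(na),          # number of NAs in each variable
    perRow = table(rowSums(na))    # how many rows have 0, 1, 2, ... NAs
  )
}

# Toy example (made-up data, not the bankruptcy set)
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c(NA, NA, 3, 4),
                  class = c(0L, 1L, 0L, 1L))
s <- summariseNA(toy)
s$share    # 3 missing cells out of 12
s$perCol   # a = 1, b = 2, class = 0
```

The `perRow` table makes it easy to see how many rows a given row threshold would discard before applying it.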
# Remove rows that have 5 or more missing values
usunNAWiersz <- function(data)
{
  # keep only the rows with fewer than 5 NAs
  data[rowSums(is.na(data)) < 5, ]
}
# Remove columns (variables) that have 70 or more missing values
usunNAKol <- function(data)
{
  # keep only the columns with fewer than 70 NAs
  data[, colSums(is.na(data)) < 70]
}
X1year <- usunNAWiersz(X1year)
X1year <- usunNAKol(X1year)
After removing the columns and rows with many missing values, about 0.1% of the cells are still missing. There are 6929 observations and 62 variables (61 financial ratios plus class).
The remaining gaps could additionally be filled with the mean of the group the observation belongs to. The summary statistics also show that some attributes contain strongly outlying observations.
vis_miss(X1year)
summary(X1year)
## Attr1 Attr2 Attr3 Attr4
## Min. :-1.37270 Min. :0.01149 Min. :-2.364000 Min. : 0.1537
## 1st Qu.: 0.01336 1st Qu.:0.32045 1st Qu.: 0.008579 1st Qu.: 1.0184
## Median : 0.06429 Median :0.51667 Median : 0.165170 Median : 1.4210
## Mean : 0.09045 Mean :0.54478 Mean : 0.155267 Mean : 2.3429
## 3rd Qu.: 0.14553 3rd Qu.:0.72481 3rd Qu.: 0.355935 3rd Qu.: 2.2654
## Max. : 1.62030 Max. :3.33570 Max. : 0.934540 Max. :88.9700
##
## Attr5 Attr6 Attr7 Attr8
## Min. :-102660.00 Min. :-3.48480 Min. :-1.37270 Min. :-2.0032
## 1st Qu.: -51.85 1st Qu.: 0.00000 1st Qu.: 0.01837 1st Qu.: 0.3530
## Median : -13.66 Median : 0.00000 Median : 0.07809 Median : 0.8665
## Mean : -34.96 Mean : 0.02897 Mean : 0.10851 Mean : 2.0574
## 3rd Qu.: 27.72 3rd Qu.: 0.10174 3rd Qu.: 0.17717 3rd Qu.: 2.0061
## Max. : 82440.00 Max. : 1.67740 Max. : 1.62030 Max. :86.0380
##
## Attr9 Attr10 Attr11 Attr12
## Min. : 0.00142 Min. :-2.3357 Min. :-0.58636 Min. :-5.19700
## 1st Qu.: 1.03208 1st Qu.: 0.2558 1st Qu.: 0.03000 1st Qu.: 0.03682
## Median : 1.17705 Median : 0.4539 Median : 0.09463 Median : 0.19659
## Mean : 1.87343 Mean : 0.4297 Mean : 0.12645 Mean : 0.63495
## 3rd Qu.: 2.18750 3rd Qu.: 0.6533 3rd Qu.: 0.19420 3rd Qu.: 0.55784
## Max. :48.00500 Max. : 0.9884 Max. : 1.62820 Max. :30.65900
## NA's :34
## Attr13 Attr14 Attr15 Attr16
## Min. :-607.4200 Min. :-1.37270 Min. :-307910.0 Min. :-1.51870
## 1st Qu.: 0.0300 1st Qu.: 0.01837 1st Qu.: 324.9 1st Qu.: 0.09353
## Median : 0.0677 Median : 0.07809 Median : 932.9 Median : 0.25348
## Mean : -0.5996 Mean : 0.10851 Mean : 2869.2 Mean : 0.73412
## 3rd Qu.: 0.1283 3rd Qu.: 0.17717 3rd Qu.: 2384.4 3rd Qu.: 0.62759
## Max. : 4.9366 Max. : 1.62030 Max. : 681770.0 Max. :31.58700
##
## Attr17 Attr18 Attr19 Attr20
## Min. : 0.2998 Min. :-1.37270 Min. :-622.0600 Min. : 0.00
## 1st Qu.: 1.3797 1st Qu.: 0.01837 1st Qu.: 0.0092 1st Qu.: 16.94
## Median : 1.9354 Median : 0.07809 Median : 0.0411 Median : 38.24
## Mean : 3.1721 Mean : 0.10851 Mean : -0.6534 Mean : 80.44
## 3rd Qu.: 3.1206 3rd Qu.: 0.17717 3rd Qu.: 0.0937 3rd Qu.: 63.07
## Max. :87.0460 Max. : 1.62030 Max. : 4.6252 Max. :25271.00
##
## Attr22 Attr23 Attr24 Attr25
## Min. :-0.49360 Min. :-634.5900 Min. :-1.80250 Min. :-3.2055
## 1st Qu.: 0.02035 1st Qu.: 0.0065 1st Qu.: 0.02839 1st Qu.: 0.1707
## Median : 0.08100 Median : 0.0333 Median : 0.13164 Median : 0.3522
## Mean : 0.11529 Mean : -0.6769 Mean : 0.17976 Mean : 0.3300
## 3rd Qu.: 0.17851 3rd Qu.: 0.0780 3rd Qu.: 0.29776 3rd Qu.: 0.5713
## Max. : 1.62820 Max. : 4.6252 Max. : 2.53290 Max. : 0.9773
## NA's :11
## Attr26 Attr28 Attr29 Attr30
## Min. :-1.51870 Min. :-83.3030 Min. :1.964 Min. : -5.209
## 1st Qu.: 0.08446 1st Qu.: 0.0181 1st Qu.:3.701 1st Qu.: 0.109
## Median : 0.22251 Median : 0.4200 Median :4.101 Median : 0.220
## Mean : 0.66320 Mean : 2.7204 Mean :4.161 Mean : 10.153
## 3rd Qu.: 0.56202 3rd Qu.: 1.2401 3rd Qu.:4.558 3rd Qu.: 0.392
## Max. :29.49900 Max. :884.8500 Max. :6.440 Max. :9238.900
## NA's :1
## Attr31 Attr32 Attr33 Attr34
## Min. :-622.0600 Min. : 1.355 Min. : 0.03525 Min. : -1.4559
## 1st Qu.: 0.0138 1st Qu.: 51.028 1st Qu.: 2.98022 1st Qu.: 0.2204
## Median : 0.0458 Median : 80.800 Median : 4.56740 Median : 1.8079
## Mean : -0.6433 Mean : 139.720 Mean : 6.93500 Mean : 3.6989
## 3rd Qu.: 0.1010 3rd Qu.: 125.000 3rd Qu.: 7.17873 3rd Qu.: 4.0953
## Max. : 4.6252 Max. :10355.000 Max. :271.02000 Max. :271.0200
##
## Attr35 Attr36 Attr38 Attr39
## Min. :-0.55967 Min. : 0.00394 Min. :-2.3357 Min. :-701.6300
## 1st Qu.: 0.01828 1st Qu.: 1.30000 1st Qu.: 0.3670 1st Qu.: 0.0098
## Median : 0.07792 Median : 1.86465 Median : 0.5518 Median : 0.0417
## Mean : 0.11375 Mean : 2.25710 Mean : 0.5149 Mean : -0.7236
## 3rd Qu.: 0.18313 3rd Qu.: 2.66945 3rd Qu.: 0.7165 3rd Qu.: 0.0913
## Max. : 2.15460 Max. :48.12100 Max. : 0.9906 Max. : 4.9681
##
## Attr40 Attr41 Attr42 Attr43
## Min. :-0.16910 Min. :-10.96900 Min. :-701.6300 Min. : 5.7
## 1st Qu.: 0.04439 1st Qu.: 0.03489 1st Qu.: 0.0111 1st Qu.: 65.6
## Median : 0.13426 Median : 0.09072 Median : 0.0432 Median : 96.5
## Mean : 0.63352 Mean : 0.42651 Mean : -0.7224 Mean : 1109.2
## 3rd Qu.: 0.40955 3rd Qu.: 0.21587 3rd Qu.: 0.0929 3rd Qu.: 131.5
## Max. :80.85800 Max. : 79.31700 Max. : 4.9681 Max. :919500.0
## NA's :5
## Attr44 Attr45 Attr46 Attr47
## Min. : 0.5 Min. :-291.450 Min. : 0.03139 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 0.066 1st Qu.: 0.52405 1st Qu.: 17.92
## Median : 51.5 Median : 0.322 Median : 0.84802 Median : 41.17
## Mean : 1028.8 Mean : 6.003 Mean : 1.60368 Mean : 58.09
## 3rd Qu.: 76.6 3rd Qu.: 0.938 3rd Qu.: 1.52045 3rd Qu.: 69.67
## Max. :894230.0 Max. :3730.000 Max. :88.97000 Max. :4510.20
## NA's :17 NA's :1
## Attr48 Attr49 Attr50 Attr51
## Min. :-3.71400 Min. :-716.2600 Min. : 0.06638 Min. :0.005791
## 1st Qu.:-0.02082 1st Qu.: -0.0126 1st Qu.: 0.77026 1st Qu.:0.242982
## Median : 0.03976 Median : 0.0217 Median : 1.12835 Median :0.396130
## Mean : 0.05672 Mean : -0.7762 Mean : 1.88121 Mean :0.438056
## 3rd Qu.: 0.13702 3rd Qu.: 0.0690 3rd Qu.: 1.78880 3rd Qu.:0.584252
## Max. : 1.56490 Max. : 4.6567 Max. :57.66100 Max. :3.335700
##
## Attr52 Attr53 Attr54 Attr55
## Min. : 0.003713 Min. :-98.1220 Min. :-82.3030 Min. :-189030.0
## 1st Qu.: 0.139298 1st Qu.: 0.6643 1st Qu.: 0.9203 1st Qu.: 4.4
## Median : 0.218945 Median : 1.1256 Median : 1.3219 Median : 1297.0
## Mean : 0.364982 Mean : 3.0711 Mean : 3.3683 Mean : 5496.6
## 3rd Qu.: 0.336088 3rd Qu.: 1.9454 3rd Qu.: 2.0981 3rd Qu.: 4955.7
## Max. :28.371000 Max. :838.4500 Max. :838.4500 Max. : 537580.0
## NA's :1 NA's :1
## Attr56 Attr57 Attr58
## Min. :-701.6300 Min. :-315.37000 Min. : 0.0042
## 1st Qu.: 0.0168 1st Qu.: 0.04114 1st Qu.: 0.8746
## Median : 0.0553 Median : 0.17105 Median : 0.9456
## Mean : -0.6950 Mean : -0.13023 Mean : 1.7109
## 3rd Qu.: 0.1299 3rd Qu.: 0.34414 3rd Qu.: 0.9837
## Max. : 0.9979 Max. : 7.26740 Max. :702.6300
##
## Attr59 Attr60 Attr61 Attr62
## Min. :-42.71700 Min. : 0.014 Min. : 0.0004 Min. : 1
## 1st Qu.: 0.00000 1st Qu.: 5.732 1st Qu.: 4.7656 1st Qu.: 46
## Median : 0.02683 Median : 9.435 Median : 7.0819 Median : 72
## Mean : 0.21861 Mean : 59.541 Mean : 11.9589 Mean : 7848
## 3rd Qu.: 0.25268 3rd Qu.: 20.351 3rd Qu.: 11.4095 3rd Qu.: 116
## Max. : 31.47200 Max. :19157.000 Max. :749.0000 Max. :7276000
## NA's :17
## Attr63 Attr64 class
## Min. : 0.00005 Min. : 0.000 Min. :0.0000
## 1st Qu.: 3.14780 1st Qu.: 2.688 1st Qu.:0.0000
## Median : 5.05845 Median : 4.794 Median :0.0000
## Mean : 7.71412 Mean : 21.683 Mean :0.2627
## 3rd Qu.: 7.88507 3rd Qu.: 9.971 3rd Qu.:1.0000
## Max. :277.72000 Max. :3756.100 Max. :1.0000
## NA's :1
Because the variables differ by orders of magnitude, they were scaled.
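The scaling call itself is not shown. A common choice in R is z-score standardisation with base `scale()`, applied to the predictors only so that the binary `class` stays 0/1 (consistent with the class column in the summary further below). A sketch under that assumption; the helper name `standardise` is hypothetical:

```r
# Standardise every column except the target (assumed to be named "class")
standardise <- function(data, target = "class") {
  data <- as.data.frame(data)
  predictors <- setdiff(names(data), target)
  # scale() centres each column on its mean and divides by its sd;
  # NAs are ignored when computing mean and sd and remain NA
  data[predictors] <- scale(data[predictors])
  data
}

toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(10, 20, 30, 40, 50),
                  class = c(0, 0, 1, 0, 1))
st <- standardise(toy)
colMeans(st[c("x", "y")])        # both approximately 0
apply(st[c("x", "y")], 2, sd)    # both 1
```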
To detect and remove outliers, a procedure based on quantiles and the skewness coefficient was used. According to the source cited here, removing outliers in this way increases the accuracy and sensitivity of the prediction (Problem of Outliers in Corporate Bankruptcy Prediction, Barbara Pawełek, Józef Pociecha).
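Written out, the skewness coefficient computed in `outlieryKwantyl` is the third central moment (as returned by `e1071::moment` with `center = TRUE`) divided by the cube of the sample standard deviation, and the cut-off applied to each variable depends on it ($Q_p$ denotes the $p$-quantile of the variable):

```latex
\gamma_1 = \frac{m_3}{s^3}, \qquad
m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}

\text{remove } x_i \text{ if }
\begin{cases}
x_i > Q_{0.99} & \text{when } \gamma_1 > 1,\\
x_i < Q_{0.01} & \text{when } \gamma_1 < -1,\\
x_i < Q_{0.005} \ \text{or}\ x_i > Q_{0.995} & \text{otherwise.}
\end{cases}
```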
# Quantile-based outlier removal; the cut-off depends on the skewness of each
# variable. The last column (class) is skipped.
outlieryKwantyl <- function(x)
{
  x <- as.data.frame(x)
  for(i in 1:(ncol(x)-1))
  {
    v <- as.double(x[,i])
    # skewness: third central moment (e1071::moment) divided by sd^3
    coefAsy <- moment(v, order=3, center=TRUE, na.rm=TRUE)/(sd(v, na.rm=TRUE)^3)
    if(is.na(coefAsy)) next  # e.g. a constant column: skewness undefined
    if(coefAsy > 1)
    {
      # right-skewed: drop values above the 99th percentile
      outliersIndex <- which(v > quantile(v, .99, na.rm=TRUE))
    }
    else if(coefAsy < -1)
    {
      # left-skewed: drop values below the 1st percentile
      outliersIndex <- which(v < quantile(v, .01, na.rm=TRUE))
    }
    else
    {
      # roughly symmetric: drop 0.5% in each tail
      outliersIndex <- which(v > quantile(v, .995, na.rm=TRUE) |
                             v < quantile(v, .005, na.rm=TRUE))
    }
    if(length(outliersIndex) > 0)
    {
      x <- x[-outliersIndex,]
    }
  }
  return(x)
}
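# Remove observations whose standardised value exceeds 3 in absolute value
# in any variable (expects scaled data; the last column, class, is skipped)
outliery <- function(x)
{
  x <- as.data.frame(x)
  for(i in 1:(ncol(x)-1))
  {
    outliersIndex <- which(abs(x[,i]) > 3)
    if(length(outliersIndex) > 0)
    {
      x <- x[-outliersIndex,]
    }
  }
  return(x)
}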
## Attr1 Attr2 Attr3 Attr4
## Min. :-1.94132 Min. :-1.41803 Min. :-2.61159 Min. :-0.44453
## 1st Qu.:-0.39234 1st Qu.:-0.53871 1st Qu.:-0.43773 1st Qu.:-0.27889
## Median :-0.13871 Median :-0.05923 Median :-0.03635 Median :-0.21033
## Mean :-0.04672 Mean :-0.01491 Mean :-0.02854 Mean :-0.16080
## 3rd Qu.: 0.21103 3rd Qu.: 0.50494 3rd Qu.: 0.40781 3rd Qu.:-0.08746
## Max. : 2.18058 Max. : 2.56399 Max. : 1.56431 Max. : 1.38973
##
## Attr5 Attr6 Attr7 Attr8
## Min. :-0.384328 Min. :-2.02664 Min. :-1.90055 Min. :-0.48083
## 1st Qu.:-0.002381 1st Qu.:-0.09434 1st Qu.:-0.42203 1st Qu.:-0.32675
## Median : 0.003556 Median :-0.09434 Median :-0.15106 Median :-0.23927
## Mean : 0.008545 Mean : 0.10333 Mean :-0.04818 Mean :-0.16284
## 3rd Qu.: 0.011028 3rd Qu.: 0.27028 3rd Qu.: 0.22297 3rd Qu.:-0.08775
## Max. : 1.706262 Max. : 2.31764 Max. : 2.68469 Max. : 1.35845
##
## Attr9 Attr10 Attr11 Attr12
## Min. :-0.6358 Min. :-2.84065 Min. :-2.10545 Min. :-1.22665
## 1st Qu.:-0.3993 1st Qu.:-0.47208 1st Qu.:-0.48155 1st Qu.:-0.27620
## Median :-0.3373 Median : 0.03309 Median :-0.15539 Median :-0.20889
## Mean :-0.0981 Mean : 0.02651 Mean :-0.07396 Mean :-0.15492
## 3rd Qu.: 0.1086 3rd Qu.: 0.53540 3rd Qu.: 0.22156 3rd Qu.:-0.08784
## Max. : 1.9470 Max. : 1.45895 Max. : 2.74400 Max. : 0.80904
## NA's :22
## Attr13 Attr14 Attr15 Attr16
## Min. :0.01745 Min. :-1.90055 Min. :-1.304566 Min. :-0.9343
## 1st Qu.:0.03218 1st Qu.:-0.42203 1st Qu.:-0.072174 1st Qu.:-0.2726
## Median :0.03371 Median :-0.15106 Median :-0.051375 Median :-0.2145
## Mean :0.03424 Mean :-0.04818 Mean :-0.034555 Mean :-0.1655
## 3rd Qu.:0.03587 3rd Qu.: 0.22297 3rd Qu.:-0.005883 3rd Qu.:-0.1079
## Max. :0.04619 Max. : 2.68469 Max. : 1.079261 Max. : 0.8336
##
## Attr17 Attr18 Attr19 Attr20
## Min. :-0.46458 Min. :-1.90055 Min. :0.01918 Min. :-0.09541
## 1st Qu.:-0.33446 1st Qu.:-0.42203 1st Qu.:0.03280 1st Qu.:-0.07124
## Median :-0.24313 Median :-0.15106 Median :0.03426 Median :-0.04772
## Mean :-0.16860 Mean :-0.04818 Mean :0.03467 Mean :-0.04282
## 3rd Qu.:-0.09802 3rd Qu.: 0.22297 3rd Qu.:0.03628 3rd Qu.:-0.02406
## Max. : 1.29463 Max. : 2.68469 Max. :0.04537 Max. : 0.10568
##
## Attr22 Attr23 Attr24 Attr25
## Min. :-1.99071 Min. :0.01993 Min. :-1.86892 Min. :-2.14890
## 1st Qu.:-0.45131 1st Qu.:0.03315 1st Qu.:-0.46759 1st Qu.:-0.33501
## Median :-0.14285 Median :0.03435 Median :-0.17693 Median : 0.07291
## Mean :-0.03658 Mean :0.03470 Mean :-0.05551 Mean : 0.07807
## 3rd Qu.: 0.22357 3rd Qu.:0.03614 3rd Qu.: 0.29246 3rd Qu.: 0.57717
## Max. : 2.87650 Max. :0.04433 Max. : 1.94647 Max. : 1.50168
## NA's :5
## Attr26 Attr28 Attr29 Attr30
## Min. :-0.9757 Min. :-0.21860 Min. :-3.01679 Min. :-0.03411
## 1st Qu.:-0.2675 1st Qu.:-0.08779 1st Qu.:-0.58776 1st Qu.:-0.03327
## Median :-0.2141 Median :-0.07782 Median :-0.03647 Median :-0.03298
## Mean :-0.1649 Mean :-0.06479 Mean : 0.06799 Mean :-0.03287
## 3rd Qu.:-0.1076 3rd Qu.:-0.05849 3rd Qu.: 0.65176 3rd Qu.:-0.03259
## Max. : 0.6298 Max. : 0.17546 Max. : 2.94338 Max. :-0.02967
##
## Attr31 Attr32 Attr33 Attr34
## Min. :0.005559 Min. :-0.23000 Min. :-0.44873 Min. :-0.387637
## 1st Qu.:0.032620 1st Qu.:-0.16454 1st Qu.:-0.29537 1st Qu.:-0.320889
## Median :0.033970 Median :-0.11727 Median :-0.18452 Median :-0.181973
## Mean :0.034406 Mean :-0.09803 Mean :-0.14900 Mean :-0.131548
## 3rd Qu.:0.036163 3rd Qu.:-0.05156 3rd Qu.:-0.04043 3rd Qu.: 0.006137
## Max. :0.047032 Max. : 0.26482 Max. : 0.66804 Max. : 0.712239
##
## Attr35 Attr36 Attr38 Attr39
## Min. :-3.40453 Min. :-0.8861 Min. :-3.27540 Min. :0.01248
## 1st Qu.:-0.42202 1st Qu.:-0.3693 1st Qu.:-0.40113 1st Qu.:0.03242
## Median :-0.15510 Median :-0.1569 Median : 0.09265 Median :0.03363
## Mean :-0.04483 Mean :-0.0779 Mean : 0.03360 Mean :0.03394
## 3rd Qu.: 0.23921 3rd Qu.: 0.1532 3rd Qu.: 0.57358 3rd Qu.:0.03537
## Max. : 2.67623 Max. : 1.7045 Max. : 1.46414 Max. :0.04348
##
## Attr40 Attr41 Attr42 Attr43
## Min. :-0.2509 Min. :-1.81419 Min. :0.02186 Min. :-0.03651
## 1st Qu.:-0.1846 1st Qu.:-0.08859 1st Qu.:0.03245 1st Qu.:-0.03475
## Median :-0.1655 Median :-0.07458 Median :0.03360 Median :-0.03386
## Mean :-0.1206 Mean :-0.07707 Mean :0.03402 Mean :-0.03374
## 3rd Qu.:-0.1046 3rd Qu.:-0.05072 3rd Qu.:0.03537 3rd Qu.:-0.03281
## Max. : 0.5751 Max. : 0.73197 Max. :0.04332 Max. :-0.02824
## NA's :2
## Attr44 Attr45 Attr46 Attr47
## Min. :-0.03514 Min. :-0.09120 Min. :-0.36102 Min. :-0.36572
## 1st Qu.:-0.03421 1st Qu.:-0.04770 1st Qu.:-0.25383 1st Qu.:-0.23035
## Median :-0.03362 Median :-0.04605 Median :-0.19546 Median :-0.09540
## Mean :-0.03346 Mean :-0.04330 Mean :-0.15135 Mean :-0.06752
## 3rd Qu.:-0.03287 3rd Qu.:-0.04235 3rd Qu.:-0.08758 3rd Qu.: 0.03786
## Max. :-0.03029 Max. : 0.04878 Max. : 0.42030 Max. : 0.69534
## NA's :5
## Attr48 Attr49 Attr50 Attr51
## Min. :-1.47020 Min. :0.02031 Min. :-0.46417 Min. :-1.38483
## 1st Qu.:-0.27235 1st Qu.:0.03297 1st Qu.:-0.29945 1st Qu.:-0.55146
## Median :-0.04591 Median :0.03427 Median :-0.21822 Median :-0.12202
## Mean : 0.02096 Mean :0.03439 Mean :-0.16681 Mean :-0.01548
## 3rd Qu.: 0.25641 3rd Qu.:0.03590 3rd Qu.:-0.09493 3rd Qu.: 0.42766
## Max. : 2.11979 Max. :0.04366 Max. : 0.71053 Max. : 3.15102
##
## Attr52 Attr53 Attr54 Attr55
## Min. :-0.22889 Min. :-0.22861 Min. :-0.23875 Min. :-2.01059
## 1st Qu.:-0.16013 1st Qu.:-0.08222 1st Qu.:-0.08408 1st Qu.:-0.15531
## Median :-0.11411 Median :-0.06983 Median :-0.07355 Median :-0.11750
## Mean :-0.09143 Mean :-0.05781 Mean :-0.06097 Mean :-0.02872
## 3rd Qu.:-0.04330 3rd Qu.:-0.05074 3rd Qu.:-0.05528 3rd Qu.:-0.00717
## Max. : 0.28960 Max. : 0.20826 Max. : 0.19784 Max. : 3.33942
##
## Attr56 Attr57 Attr58 Attr59
## Min. :0.01123 Min. :-0.08802 Min. :-0.04748 Min. :-0.70102
## 1st Qu.:0.03125 1st Qu.: 0.01737 1st Qu.:-0.03545 1st Qu.:-0.10159
## Median :0.03269 Median : 0.02926 Median :-0.03333 Median :-0.06659
## Mean :0.03332 Mean : 0.03240 Mean :-0.03400 Mean : 0.06408
## 3rd Qu.:0.03504 3rd Qu.: 0.04346 3rd Qu.:-0.03196 3rd Qu.: 0.05526
## Max. :0.04748 Max. : 0.29174 Max. :-0.02045 Max. : 6.65298
##
## Attr60 Attr61 Attr62 Attr63
## Min. :-0.08840 Min. :-0.30911 Min. :-0.03305 Min. :-0.44710
## 1st Qu.:-0.08240 1st Qu.:-0.22433 1st Qu.:-0.03292 1st Qu.:-0.31179
## Median :-0.07782 Median :-0.15291 Median :-0.03283 Median :-0.20147
## Mean :-0.06745 Mean :-0.09070 Mean :-0.03279 Mean :-0.16570
## 3rd Qu.:-0.06470 3rd Qu.:-0.02879 3rd Qu.:-0.03269 3rd Qu.:-0.05955
## Max. : 0.15715 Max. : 1.64506 Max. :-0.03217 Max. : 0.61869
## NA's :5
## Attr64 class
## Min. :-0.15650 Min. :0.0000
## 1st Qu.:-0.13821 1st Qu.:0.0000
## Median :-0.12364 Median :0.0000
## Mean :-0.09880 Mean :0.2619
## 3rd Qu.:-0.09569 3rd Qu.:1.0000
## Max. : 0.50834 Max. :1.0000
##
Practically every variable shows high variability, as measured by the coefficient of variation (standard deviation divided by mean) printed below; the exceptions are Attr13, Attr20, Attr30, Attr43, Attr44, Attr49, Attr56, Attr59 and Attr62.
## Attr1 Attr2 Attr3 Attr4 Attr5 Attr6 Attr7 Attr8 Attr9 Attr10
## -11.951 -46.548 -23.287 -1.133 12.549 5.038 -12.479 -1.530 -4.542 25.352
## Attr11 Attr12 Attr13 Attr14 Attr15 Attr16 Attr17 Attr18 Attr19 Attr20
## -8.694 -1.270 0.100 -12.479 -4.643 -1.042 -1.482 -12.479 0.089 -0.821
## Attr22 Attr23 Attr24 Attr25 Attr26 Attr28 Attr29 Attr30 Attr31 Attr32
## -17.658 0.078 -10.947 8.330 -0.989 -0.703 13.387 -0.019 0.100 -0.884
## Attr33 Attr34 Attr35 Attr36 Attr38 Attr39 Attr40 Attr41 Attr42 Attr43
## -1.318 -1.664 -14.236 -5.606 20.454 0.085 -0.925 -2.235 0.076 -0.040
## Attr44 Attr45 Attr46 Attr47 Attr48 Attr49 Attr50 Attr51 Attr52 Attr53
## -0.029 -0.243 -0.969 -3.011 23.488 0.082 -1.205 -46.059 -0.988 -0.807
## Attr54 Attr55 Attr56 Attr57 Attr58 Attr59 Attr60 Attr61 Attr62 Attr63
## -0.755 -14.316 0.108 0.914 -0.098 6.425 -0.434 -2.469 -0.005 -1.171
## Attr64 class
## -0.785 1.680
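The printout above is consistent with the coefficient of variation: for `class`, sqrt(0.2619 × 0.7381) / 0.2619 ≈ 1.68, which matches the last value. The call that produced it is not shown; one base-R way to obtain such a vector is sketched below (the helper name is hypothetical). Note that after standardisation the column means are close to zero, so the coefficient becomes unstable and its sign simply follows the sign of the mean.

```r
# Coefficient of variation of every numeric column: sd / mean (NAs ignored)
coefVariation <- function(data) {
  data <- as.data.frame(data)
  num <- vapply(data, is.numeric, logical(1))   # keep only numeric columns
  vapply(data[num],
         function(v) sd(v, na.rm = TRUE) / mean(v, na.rm = TRUE),
         numeric(1))
}

toy <- data.frame(a = c(2, 4, 6), b = c(-1, -2, -3), class = c(0, 1, 1))
cv <- coefVariation(toy)
round(cv, 3)   # a = 0.5, b = -0.5, class = 0.866
```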
Because many functions are sensitive to missing values, the remaining missing values were replaced with the group mean. The class variable is converted to a factor.
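A minimal sketch of this step, assuming the target column is called `class` and that the "group" is the observation's class; `ave()` is base R, while the helper name `imputeGroupMean` is hypothetical:

```r
# Replace NAs in every predictor with the mean of that predictor within the
# observation's class, then turn class into a factor
imputeGroupMean <- function(data, target = "class") {
  data <- as.data.frame(data)
  predictors <- setdiff(names(data), target)
  for (col in predictors) {
    # ave() returns, for each row, the mean of `col` within that row's class
    groupMean <- ave(data[[col]], data[[target]],
                     FUN = function(v) mean(v, na.rm = TRUE))
    missing <- is.na(data[[col]])
    data[[col]][missing] <- groupMean[missing]
  }
  data[[target]] <- factor(data[[target]])
  data
}

toy <- data.frame(x = c(1, NA, 3, 10, NA, 20), class = c(0, 0, 0, 1, 1, 1))
res <- imputeGroupMean(toy)
res$x   # NA in class 0 becomes 2, NA in class 1 becomes 15
```

If a variable is entirely missing within one class, `mean()` returns `NaN` for that group, so such columns are best removed beforehand (as done above with `usunNAKol`).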
The correlation matrix shows no strong correlation between any of the variables and bankruptcy. The variables are sorted in decreasing order of correlation.
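With `class` coded 0/1, the Pearson correlation between a predictor and `class` is the point-biserial correlation. A sketch of ranking the predictors by it, assuming "decreasing order" means by absolute value (an assumption); the function name is hypothetical:

```r
# Rank predictors by |cor(predictor, class)|; class may be 0/1 numeric or a factor
rankByClassCorrelation <- function(data, target = "class") {
  data <- as.data.frame(data)
  y <- as.numeric(as.character(data[[target]]))   # factor "0"/"1" -> 0/1
  predictors <- setdiff(names(data), target)
  r <- vapply(data[predictors],
              function(v) cor(v, y, use = "complete.obs"),
              numeric(1))
  r[order(abs(r), decreasing = TRUE)]
}

toy <- data.frame(weak   = c(5, 1, 4, 2, 3, 3),
                  strong = c(1, 2, 1, 8, 9, 7),
                  class  = c(0, 0, 0, 1, 1, 1))
rk <- rankByClassCorrelation(toy)
names(rk)   # "strong" "weak"
```

The ordered names can then be used to reorder the columns of the matrix passed to `corrplot()`.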
However, there are strong correlations among the variables themselves, so principal component analysis could be an appropriate way to reduce dimensionality.
Working with this many variables is difficult: it hampers visualisation and can reduce efficiency by including variables that do not contribute to the analysis. A principal component analysis was therefore carried out.
| Component | Eigenvalue | % of variance explained | Cumulative % of variance |
|---|---|---|---|
| 1 | 21.25 | 34.84 | 34.84 |
| 2 | 8.46 | 13.86 | 48.70 |
| 3 | 6.72 | 11.02 | 59.72 |
| 4 | 4.47 | 7.33 | 67.05 |
| 5 | 2.32 | 3.80 | 70.85 |
| 6 | 1.79 | 2.93 | 73.78 |
| 7 | 1.75 | 2.86 | 76.64 |
| 8 | 1.32 | 2.16 | 78.80 |
| 9 | 1.17 | 1.92 | 80.72 |
| 10 | 1.11 | 1.82 | 82.54 |
| 11 | 1.02 | 1.67 | 84.21 |
| 12 | 0.95 | 1.55 | 85.76 |
| 13 | 0.81 | 1.34 | 87.09 |
| 14 | 0.77 | 1.26 | 88.36 |
| 15 | 0.72 | 1.19 | 89.55 |
| 16 | 0.67 | 1.09 | 90.64 |
| 17 | 0.64 | 1.06 | 91.70 |
| 18 | 0.56 | 0.92 | 92.61 |
| 19 | 0.52 | 0.86 | 93.47 |
| 20 | 0.47 | 0.78 | 94.25 |
| 21 | 0.43 | 0.71 | 94.96 |
| 22 | 0.36 | 0.59 | 95.55 |
| 23 | 0.34 | 0.57 | 96.12 |
| 24 | 0.31 | 0.50 | 96.62 |
| 25 | 0.29 | 0.47 | 97.09 |
| 26 | 0.25 | 0.41 | 97.50 |
| 27 | 0.20 | 0.33 | 97.83 |
| 28 | 0.16 | 0.26 | 98.09 |
| 29 | 0.15 | 0.24 | 98.33 |
| 30 | 0.13 | 0.22 | 98.54 |
| 31 | 0.12 | 0.19 | 98.74 |
| 32 | 0.11 | 0.18 | 98.92 |
| 33 | 0.10 | 0.16 | 99.08 |
| 34 | 0.09 | 0.14 | 99.22 |
| 35 | 0.07 | 0.12 | 99.33 |
| 36 | 0.06 | 0.11 | 99.44 |
| 37 | 0.05 | 0.09 | 99.53 |
| 38 | 0.05 | 0.08 | 99.61 |
| 39 | 0.04 | 0.06 | 99.67 |
| 40 | 0.03 | 0.05 | 99.72 |
| 41 | 0.03 | 0.05 | 99.77 |
| 42 | 0.03 | 0.04 | 99.81 |
| 43 | 0.03 | 0.04 | 99.85 |
| 44 | 0.02 | 0.03 | 99.88 |
| 45 | 0.01 | 0.02 | 99.90 |
| 46 | 0.01 | 0.02 | 99.92 |
| 47 | 0.01 | 0.02 | 99.94 |
| 48 | 0.01 | 0.02 | 99.95 |
| 49 | 0.01 | 0.01 | 99.97 |
| 50 | 0.01 | 0.01 | 99.98 |
| 51 | 0.00 | 0.01 | 99.98 |
| 52 | 0.00 | 0.01 | 99.99 |
| 53 | 0.00 | 0.00 | 99.99 |
| 54 | 0.00 | 0.00 | 100.00 |
| 55 | 0.00 | 0.00 | 100.00 |
| 56 | 0.00 | 0.00 | 100.00 |
| 57 | 0.00 | 0.00 | 100.00 |
| 58 | 0.00 | 0.00 | 100.00 |
| 59 | 0.00 | 0.00 | 100.00 |
| 60 | 0.00 | 0.00 | 100.00 |
| 61 | 0.00 | 0.00 | 100.00 |

| | Component 1 | Component 2 | Component 3 | Component 4 | Component 5 |
|---|---|---|---|---|---|
| Attr1 | 0.89 | 0.39 | -0.04 | 0.01 | -0.03 |
| Attr2 | -0.58 | 0.68 | -0.17 | 0.12 | -0.02 |
| Attr3 | 0.62 | -0.46 | 0.14 | 0.48 | -0.04 |
| Attr4 | 0.58 | -0.60 | 0.16 | 0.23 | -0.04 |
| Attr5 | 0.00 | -0.07 | -0.06 | -0.03 | 0.11 |
| Attr6 | 0.49 | -0.01 | 0.18 | -0.24 | -0.14 |
| Attr7 | 0.89 | 0.40 | -0.05 | 0.02 | -0.04 |
| Attr8 | 0.49 | -0.70 | 0.14 | -0.14 | 0.08 |
| Attr9 | 0.12 | 0.27 | -0.65 | 0.34 | 0.05 |
| Attr10 | 0.59 | -0.65 | 0.17 | -0.10 | 0.03 |
| Attr11 | 0.86 | 0.41 | -0.07 | 0.05 | -0.02 |
| Attr12 | 0.91 | 0.11 | 0.06 | -0.12 | -0.03 |
| Attr13 | 0.77 | 0.17 | 0.36 | -0.24 | 0.06 |
| Attr14 | 0.89 | 0.40 | -0.05 | 0.02 | -0.04 |
| Attr15 | -0.11 | 0.14 | -0.01 | 0.02 | -0.05 |
| Attr16 | 0.90 | -0.04 | 0.05 | -0.14 | 0.01 |
| Attr17 | 0.48 | -0.70 | 0.14 | -0.15 | 0.08 |
| Attr18 | 0.89 | 0.40 | -0.05 | 0.02 | -0.04 |
| Attr19 | 0.86 | 0.32 | 0.26 | -0.11 | 0.03 |
| Attr20 | -0.14 | -0.12 | 0.60 | 0.32 | -0.62 |
| Attr22 | 0.84 | 0.42 | -0.06 | 0.01 | -0.07 |
| Attr23 | 0.85 | 0.32 | 0.25 | -0.12 | 0.03 |
| Attr24 | 0.83 | 0.23 | 0.02 | -0.02 | -0.08 |
| Attr25 | 0.57 | -0.49 | 0.26 | -0.11 | -0.03 |
| Attr26 | 0.89 | -0.06 | 0.06 | -0.15 | 0.02 |
| Attr28 | 0.40 | -0.14 | 0.02 | 0.80 | 0.05 |
| Attr29 | -0.08 | -0.14 | 0.47 | -0.36 | 0.00 |
| Attr30 | -0.57 | 0.34 | 0.47 | -0.14 | 0.03 |
| Attr31 | 0.80 | 0.31 | 0.23 | -0.08 | 0.05 |
| Attr32 | -0.50 | 0.47 | 0.62 | 0.06 | 0.15 |
| Attr33 | 0.48 | -0.50 | -0.58 | -0.08 | -0.12 |
| Attr34 | 0.30 | -0.04 | -0.49 | 0.30 | 0.11 |
| Attr35 | 0.79 | 0.42 | -0.05 | 0.00 | -0.08 |
| Attr36 | 0.10 | 0.29 | -0.81 | 0.24 | -0.08 |
| Attr38 | 0.51 | -0.62 | 0.18 | -0.27 | -0.02 |
| Attr39 | 0.76 | 0.32 | 0.26 | -0.11 | -0.03 |
| Attr40 | 0.48 | -0.38 | 0.05 | -0.01 | 0.19 |
| Attr41 | 0.04 | 0.05 | -0.02 | -0.02 | -0.02 |
| Attr42 | 0.82 | 0.33 | 0.29 | -0.12 | 0.00 |
| Attr43 | -0.16 | -0.07 | 0.81 | 0.44 | -0.02 |
| Attr44 | -0.09 | 0.03 | 0.54 | 0.30 | 0.63 |
| Attr45 | 0.55 | 0.21 | -0.07 | -0.20 | 0.40 |
| Attr46 | 0.57 | -0.53 | 0.10 | 0.14 | 0.34 |
| Attr47 | -0.06 | -0.09 | 0.62 | 0.32 | -0.62 |
| Attr48 | 0.80 | 0.44 | -0.05 | 0.09 | -0.06 |
| Attr49 | 0.76 | 0.42 | 0.13 | 0.03 | -0.04 |
| Attr50 | 0.59 | -0.54 | 0.12 | 0.36 | 0.06 |
| Attr51 | -0.48 | 0.66 | -0.18 | 0.29 | 0.04 |
| Attr52 | -0.50 | 0.46 | 0.63 | 0.06 | 0.15 |
| Attr53 | 0.44 | -0.18 | 0.04 | 0.78 | 0.09 |
| Attr54 | 0.42 | -0.15 | 0.03 | 0.79 | 0.06 |
| Attr55 | 0.25 | -0.30 | 0.17 | 0.02 | 0.02 |
| Attr56 | 0.59 | 0.24 | 0.22 | -0.05 | 0.03 |
| Attr57 | 0.50 | 0.52 | -0.16 | 0.07 | -0.08 |
| Attr58 | -0.62 | -0.24 | -0.24 | 0.06 | -0.03 |
| Attr59 | -0.20 | 0.15 | -0.02 | -0.15 | -0.04 |
| Attr60 | 0.06 | 0.06 | -0.50 | -0.21 | 0.55 |
| Attr61 | 0.11 | 0.00 | -0.44 | -0.25 | -0.55 |
| Attr62 | -0.60 | 0.42 | 0.57 | 0.07 | 0.14 |
| Attr63 | 0.58 | -0.46 | -0.52 | -0.08 | -0.12 |
| Attr64 | 0.09 | 0.24 | -0.31 | 0.76 | 0.04 |
The analysis is presented for the data from the first file; the results for the remaining files look very similar.
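For reference, eigenvalue and loading tables of this kind can be produced in base R with `prcomp()`. The sketch below is not necessarily the code behind the tables above. It assumes a numeric data frame with no missing values and standardises it, so the eigenvalues sum to the number of variables (61 in the table above). It then scales the eigenvectors by the square root of the eigenvalues, which makes the loadings variable-component correlations. The helper name `pca_tables()` and the toy data are hypothetical.

```r
# Sketch: eigenvalue table and correlation loadings with base-R prcomp().
# Assumption: `x` is a numeric data frame without missing values.
pca_tables <- function(x, n_comp = 5) {
  pca <- prcomp(x, center = TRUE, scale. = TRUE)  # PCA on the correlation matrix
  eig <- pca$sdev^2                               # eigenvalues
  eig_table <- data.frame(
    eigenvalue = eig,
    pct_var    = 100 * eig / sum(eig),            # % of total variance
    cum_pct    = 100 * cumsum(eig) / sum(eig)     # cumulative %
  )
  # Loadings as correlations: eigenvector column k multiplied by sqrt(eigenvalue k)
  loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)[, seq_len(n_comp), drop = FALSE]
  colnames(loadings) <- paste("Component", seq_len(n_comp))
  list(eigen = eig_table, loadings = loadings, pca = pca)
}

# Toy usage (hypothetical data, two deliberately correlated columns)
set.seed(1)
toy <- as.data.frame(matrix(rnorm(200 * 6), ncol = 6))
toy$V2 <- toy$V1 + rnorm(200, sd = 0.3)
res <- pca_tables(toy, n_comp = 3)
round(res$eigen, 2)
round(res$loadings, 2)
```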
I looked for a second dataset on Polish companies (bankrupt and non-bankrupt) but found nothing suitable. The second dataset therefore covers companies from Slovakia. The data are split by year (4 years) and by economic sector, such as agriculture, construction, manufacturing and trade.
The data were downloaded from https://data.mendeley.com/datasets/j89csb932y/2
They contain 63 financial ratios and bankruptcy information for more than 10,000 companies in each year.
Missing values are fairly common for individual variables.
## V1 V2 V3
## Min. :-3004.8300 Min. :-1975000.0 Min. :-81743.75
## 1st Qu.: -0.8925 1st Qu.: 0.0 1st Qu.: -0.28
## Median : 2.0650 Median : 10.8 Median : 1.90
## Mean : 0.7218 Mean : -818.4 Mean : -38.41
## 3rd Qu.: 10.3900 3rd Qu.: 43.1 3rd Qu.: 7.27
## Max. : 2346.5300 Max. : 6124.1 Max. : 1748.54
## NA's :5 NA's :14 NA's :145
## V4 V5 V6 V7
## Min. : -852.430 Min. : -851.43 Min. : -851.430 Min. : -8533.7
## 1st Qu.: 0.050 1st Qu.: 0.50 1st Qu.: 0.850 1st Qu.: 142.8
## Median : 0.250 Median : 0.95 Median : 1.295 Median : 241.7
## Mean : 13.472 Mean : 14.89 Mean : 16.731 Mean : 1864.8
## 3rd Qu.: 0.868 3rd Qu.: 1.92 3rd Qu.: 2.610 3rd Qu.: 463.1
## Max. :25499.000 Max. :25499.00 Max. :25499.000 Max. :889866.2
## NA's :79 NA's :79 NA's :79 NA's :142
## V8 V9 V10 V11
## Min. : -932.39 Min. : -7378.26 Min. : -34.4 Min. :-0.6900
## 1st Qu.: 24.29 1st Qu.: 47.37 1st Qu.: 5.8 1st Qu.: 0.3400
## Median : 56.61 Median : 91.73 Median : 29.5 Median : 0.6400
## Mean : 338.68 Mean : 516.60 Mean : 549.4 Mean : 0.7967
## 3rd Qu.: 108.75 3rd Qu.: 179.40 3rd Qu.: 78.3 3rd Qu.: 0.8800
## Max. :193468.12 Max. :141102.85 Max. :882843.8 Max. :83.1600
## NA's :221 NA's :150 NA's :435 NA's :30
## V12 V13 V14 V15
## Min. : -1506.55 Min. : -1505.55 Min. :-2675.040 Min. : -69.32
## 1st Qu.: 0.25 1st Qu.: 1.28 1st Qu.: 6.795 1st Qu.: 35.60
## Median : 1.15 Median : 2.16 Median : 20.405 Median : 64.88
## Mean : 112.99 Mean : 112.67 Mean : 27.925 Mean : 68.86
## 3rd Qu.: 4.19 3rd Qu.: 5.20 3rd Qu.: 41.947 3rd Qu.: 88.34
## Max. :271726.00 Max. :271727.00 Max. : 2346.530 Max. :2738.74
## NA's :39 NA's :14 NA's :5 NA's :7
## V16 V17 V18 V19
## Min. :-5977.50 Min. :-5290.460 Min. :-274698 Min. :-13.640
## 1st Qu.: 3.44 1st Qu.: 0.608 1st Qu.: 7906 1st Qu.: 0.000
## Median : 16.10 Median : 1.150 Median : 14618 Median : 0.000
## Mean : 83.18 Mean : 0.628 Mean : 21669 Mean : 5.641
## 3rd Qu.: 46.47 3rd Qu.: 2.373 3rd Qu.: 24869 3rd Qu.: 4.790
## Max. :43100.00 Max. : 566.640 Max. : 713965 Max. :183.530
## NA's :70 NA's :473 NA's :1780 NA's :42
## V20 V21 V22
## Min. : -39.690 Min. : -9839.66 Min. :-7946.350
## 1st Qu.: 4.285 1st Qu.: 8.22 1st Qu.: -0.060
## Median : 12.280 Median : 51.75 Median : 2.335
## Mean : 28.767 Mean : 135.25 Mean : -0.161
## 3rd Qu.: 24.650 3rd Qu.: 81.40 3rd Qu.: 10.530
## Max. :21118.750 Max. :178060.62 Max. : 2550.000
## NA's :142 NA's :57 NA's :7
## V23 V24 V25 V26
## Min. :-924266.7 Min. :-38705.37 Min. :-844.000 Min. :-843.000
## 1st Qu.: 0.3 1st Qu.: 0.01 1st Qu.: 0.050 1st Qu.: 0.480
## Median : 10.8 Median : 1.77 Median : 0.210 Median : 0.940
## Mean : -377.7 Mean : -20.45 Mean : 3.687 Mean : 6.840
## 3rd Qu.: 39.3 3rd Qu.: 6.87 3rd Qu.: 0.850 3rd Qu.: 1.925
## Max. : 61974.8 Max. : 1261.67 Max. :2949.670 Max. :4591.570
## NA's :14 NA's :97 NA's :63 NA's :58
## V27 V28 V29 V30
## Min. :-843.000 Min. : -50.1 Min. : -78.3 Min. : -3391.2
## 1st Qu.: 0.870 1st Qu.: 140.4 1st Qu.: 25.4 1st Qu.: 43.6
## Median : 1.300 Median : 231.8 Median : 53.4 Median : 85.4
## Mean : 7.529 Mean : 3196.8 Mean : 560.7 Mean : 1429.9
## 3rd Qu.: 2.572 3rd Qu.: 445.3 3rd Qu.: 98.3 3rd Qu.: 168.5
## Max. :4591.570 Max. :2561227.8 Max. :681721.9 Max. :1482960.6
## NA's :61 NA's :97 NA's :170 NA's :105
## V31 V32 V33 V34
## Min. : -6.62 Min. : -6.2900 Min. :-6841.34 Min. :-7269.17
## 1st Qu.: 6.24 1st Qu.: 0.3500 1st Qu.: 0.28 1st Qu.: 1.30
## Median : 30.93 Median : 0.6400 Median : 1.19 Median : 2.21
## Mean : 155.69 Mean : 0.9428 Mean : 39.08 Mean : 38.17
## 3rd Qu.: 80.99 3rd Qu.: 0.8800 3rd Qu.: 4.13 3rd Qu.: 5.12
## Max. :77371.00 Max. :134.2800 Max. :88584.00 Max. :88585.00
## NA's :404 NA's :26 NA's :33 NA's :14
## V35 V36 V37 V38
## Min. :-6709.380 Min. : -628.93 Min. : -9580.00 Min. :-1457.45
## 1st Qu.: 7.655 1st Qu.: 35.97 1st Qu.: 4.04 1st Qu.: 0.63
## Median : 21.670 Median : 66.09 Median : 16.34 Median : 1.18
## Mean : 29.709 Mean : 76.40 Mean : 111.10 Mean : 21.08
## 3rd Qu.: 44.670 3rd Qu.: 88.60 3rd Qu.: 44.67 3rd Qu.: 2.49
## Max. : 2550.000 Max. :13427.61 Max. :189291.67 Max. :20078.50
## NA's :7 NA's :11 NA's :48 NA's :436
## V39 V40 V41 V42
## Min. :-73059 Min. : -22.380 Min. : -10.31 Min. :-5710.18
## 1st Qu.: 8352 1st Qu.: 0.000 1st Qu.: 4.69 1st Qu.: 18.78
## Median : 15767 Median : 0.000 Median : 12.23 Median : 55.69
## Mean : 21323 Mean : 6.697 Mean : 22.48 Mean : 71.28
## 3rd Qu.: 26070 3rd Qu.: 6.630 3rd Qu.: 24.32 3rd Qu.: 81.63
## Max. :249036 Max. :1483.850 Max. :6574.96 Max. :16890.70
## NA's :1985 NA's :16 NA's :96 NA's :47
## V43 V44 V45
## Min. :-146527.27 Min. :-83175.00 Min. :-15939.680
## 1st Qu.: -0.59 1st Qu.: 0.12 1st Qu.: -0.330
## Median : 1.98 Median : 8.70 Median : 1.450
## Mean : -63.12 Mean : -45.87 Mean : -3.726
## 3rd Qu.: 8.66 3rd Qu.: 34.53 3rd Qu.: 6.605
## Max. : 576.15 Max. : 14222.09 Max. : 6810.280
## NA's :17 NA's :14 NA's :86
## V46 V47 V48 V49
## Min. :-8911.00 Min. :-8911.000 Min. :-8911.000 Min. : -1484.0
## 1st Qu.: 0.05 1st Qu.: 0.480 1st Qu.: 0.850 1st Qu.: 145.7
## Median : 0.22 Median : 0.940 Median : 1.290 Median : 245.9
## Mean : 1.92 Mean : 4.816 Mean : 5.491 Mean : 2797.2
## 3rd Qu.: 0.87 3rd Qu.: 1.980 3rd Qu.: 2.650 3rd Qu.: 478.9
## Max. : 5354.33 Max. : 5354.330 Max. : 5393.670 Max. :3045765.5
## NA's :67 NA's :57 NA's :61 NA's :88
## V50 V51 V52 V53
## Min. : -1444.2 Min. : -19790.6 Min. : -26.04 Min. : -1.840
## 1st Qu.: 26.3 1st Qu.: 43.3 1st Qu.: 11.38 1st Qu.: 0.340
## Median : 54.8 Median : 85.2 Median : 34.22 Median : 0.640
## Mean : 689.2 Mean : 1216.0 Mean : 184.12 Mean : 1.376
## 3rd Qu.: 103.6 3rd Qu.: 177.6 3rd Qu.: 87.30 3rd Qu.: 0.880
## Max. :680895.6 Max. :1046331.9 Max. :148259.72 Max. :1151.040
## NA's :174 NA's :101 NA's :521 NA's :44
## V54 V55 V56 V57
## Min. :-2194.34 Min. :-2193.34 Min. :-146527.27 Min. : -184.34
## 1st Qu.: 0.27 1st Qu.: 1.27 1st Qu.: 6.88 1st Qu.: 35.08
## Median : 1.24 Median : 2.25 Median : 20.68 Median : 65.15
## Mean : 36.97 Mean : 38.34 Mean : -19.25 Mean : 125.01
## 3rd Qu.: 4.04 3rd Qu.: 4.98 3rd Qu.: 43.12 3rd Qu.: 88.64
## Max. :77121.00 Max. :77122.00 Max. : 26220.79 Max. :115103.90
## NA's :41 NA's :14 NA's :17 NA's :20
## V58 V59 V60 V61
## Min. :-2472.730 Min. :-218.040 Min. :-522440 Min. :-11.870
## 1st Qu.: 3.175 1st Qu.: 0.630 1st Qu.: 7910 1st Qu.: 0.000
## Median : 14.380 Median : 1.190 Median : 14747 Median : 0.000
## Mean : 56.867 Mean : 4.796 Mean : 18348 Mean : 6.388
## 3rd Qu.: 40.288 3rd Qu.: 2.590 3rd Qu.: 24163 3rd Qu.: 6.520
## Max. :11295.900 Max. :1288.490 Max. : 747655 Max. :257.630
## NA's :47 NA's :417 NA's :801 NA's :20
## V62 V63 class
## Min. : -20.96 Min. :-23692.80 Min. :0.00000
## 1st Qu.: 4.93 1st Qu.: 21.68 1st Qu.:0.00000
## Median : 12.62 Median : 58.16 Median :0.00000
## Mean : 22.69 Mean : 39.62 Mean :0.02764
## 3rd Qu.: 24.17 3rd Qu.: 83.14 3rd Qu.:0.00000
## Max. :7173.30 Max. : 20746.67 Max. :1.00000
## NA's :85 NA's :55
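The per-variable NA counts above come from `summary()`. A compact base-R sketch that ranks variables by missingness and computes the overall share of missing cells (the kind of figure quoted for the cleaned data) is shown below. The helper names are hypothetical.

```r
# Sketch: rank variables by number of missing values (base R only).
missing_summary <- function(df) {
  n_miss <- colSums(is.na(df))
  out <- data.frame(
    variable = names(n_miss),
    n_miss   = as.integer(n_miss),
    pct_miss = round(100 * n_miss / nrow(df), 2),  # % of rows missing this variable
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  out[order(out$n_miss, decreasing = TRUE), ]
}

# Overall percentage of missing cells in the whole data frame
overall_missing_pct <- function(df) 100 * mean(is.na(df))
```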
The same procedure as for the first dataset was applied: variables and observations with a very large number of missing values were removed.
Missing values now make up only about 0.1% of the data, and the remaining gaps can be filled with the group mean.
The data were scaled, outliers were removed with the function used previously, missing values were replaced with the group mean, and the class variable was converted to a factor.
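A minimal base-R sketch of these steps follows. The thresholds `max_na_col` and `max_na_row` are assumptions, because the cut-offs actually used are not stated. The "group mean" is taken to be the mean of the same variable within the same `class`. Outlier removal is left out because the function used earlier is not shown in this section. The helper name `preprocess_missing()` is hypothetical.

```r
# Sketch of the cleaning steps; thresholds are assumed, not the document's values.
# `df` is assumed to hold numeric ratio columns plus a 0/1 `class` column.
preprocess_missing <- function(df, max_na_col = 0.05, max_na_row = 0.10) {
  x <- df[, setdiff(names(df), "class"), drop = FALSE]
  # 1. Drop variables whose share of missing values exceeds max_na_col
  x <- x[, colMeans(is.na(x)) <= max_na_col, drop = FALSE]
  # 2. Drop observations whose share of missing values exceeds max_na_row
  keep_rows <- rowMeans(is.na(x)) <= max_na_row
  x <- x[keep_rows, , drop = FALSE]
  cls <- df$class[keep_rows]
  # 3. Standardise each column (scale() ignores NAs when computing mean and sd)
  x <- as.data.frame(scale(x))
  # (Outlier removal would go here; the original function is not reproduced.)
  # 4. Fill remaining NAs with the mean of the same column within the same class
  for (col in names(x)) {
    miss <- is.na(x[[col]])
    grp_mean <- ave(x[[col]], cls, FUN = function(v) mean(v, na.rm = TRUE))
    x[[col]][miss] <- grp_mean[miss]
  }
  # 5. Store the target as a factor
  x$class <- factor(cls)
  x
}
```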
Practically every variable shows high variability; the exceptions are V7 and V8.
## V1 V2 V3 V4 V5 V6 V7 V8
## 4.601 0.073 2.024 -3.426 -1.829 -1.976 -0.309 -0.272
## V9 V11 V12 V13 V14 V15 V16 V19
## -0.346 -1.933 -0.054 -0.054 -111.878 -5.548 -1.881 -27.986
## V20 V21 V22 V23 V24 V25 V26 V27
## -2.096 -0.843 5.501 0.187 0.289 -1.502 -0.329 -0.398
## V28 V29 V30 V32 V33 V34 V35 V36
## -0.090 -0.099 -0.163 -1.781 -0.160 -0.169 -50.329 -5.032
## V37 V40 V41 V42 V43 V44 V45 V46
## -3.152 -20.352 -1.355 -5.266 3.505 0.868 0.939 -1.084
## V47 V48 V49 V50 V51 V53 V54 V55
## -0.355 -0.431 -0.080 -0.077 -0.215 -1.716 -0.146 -0.145
## V56 V57 V58 V61 V62 V63 class
## 52.841 -3.805 -3.059 -30.646 -1.352 -258.875 10.510
The correlation matrix shows no strong correlation between any of the variables and bankruptcy. The variables are sorted in decreasing order of correlation.
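A base-R sketch of such a ranking follows. It computes the Pearson correlation of each ratio with the 0/1 bankruptcy indicator (the point-biserial correlation) and sorts by absolute value. Sorting by absolute value is an assumption, and the original analysis may have computed these correlations differently. The helper name is hypothetical.

```r
# Sketch: correlation of each predictor with a 0/1 target, sorted by |r|.
cor_with_target <- function(df, target = "class") {
  y <- as.numeric(as.character(df[[target]]))   # works for 0/1 numeric or factor
  x <- df[, setdiff(names(df), target), drop = FALSE]
  r <- vapply(x, function(col) cor(col, y, use = "complete.obs"), numeric(1))
  r[order(abs(r), decreasing = TRUE)]           # strongest (positive or negative) first
}
```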
There are fewer bankrupt companies than in the Polish dataset.
Given how few bankrupt companies there are relative to “healthy” ones, I am considering whether to remove observations with missing data or to replace a larger share of them with the mean.
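One way to inform this choice is to count how many bankrupt firms would survive listwise deletion of incomplete rows. The sketch below assumes a 0/1 `class` column, and the helper name is hypothetical.

```r
# Sketch: class counts before and after dropping rows with any missing value.
class_counts_after_deletion <- function(df, target = "class") {
  y <- factor(df[[target]])          # fixed levels so both tables align
  complete <- complete.cases(df)     # TRUE for rows with no NA in any column
  rbind(all = table(y), complete = table(y[complete]))
}
```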
These data were downloaded from Kaggle: https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction
They cover companies listed on the Taiwanese stock exchange in 1999-2009 and comprise 95 ratios for 6819 observations.
There are no missing values.
## ROA(C) before interest and depreciation before interest
## Min. :0.2560
## 1st Qu.:0.4761
## Median :0.5010
## Mean :0.5038
## 3rd Qu.:0.5340
## Max. :0.7753
## ROA(A) before interest and % after tax
## Min. :0.2648
## 1st Qu.:0.5360
## Median :0.5596
## Mean :0.5586
## 3rd Qu.:0.5870
## Max. :0.9847
## ROA(B) before interest and depreciation after tax Operating Gross Margin
## Min. :0.2821 Min. :0.5127
## 1st Qu.:0.5266 1st Qu.:0.6000
## Median :0.5502 Median :0.6064
## Mean :0.5522 Mean :0.6080
## 3rd Qu.:0.5824 3rd Qu.:0.6136
## Max. :0.8103 Max. :0.6652
## Realized Sales Gross Margin Operating Profit Rate Pre-tax net Interest Rate
## Min. :0.5127 Min. :0.9873 Min. :0.7651
## 1st Qu.:0.6001 1st Qu.:0.9990 1st Qu.:0.7974
## Median :0.6064 Median :0.9990 Median :0.7975
## Mean :0.6079 Mean :0.9990 Mean :0.7974
## 3rd Qu.:0.6135 3rd Qu.:0.9991 3rd Qu.:0.7976
## Max. :0.6652 Max. :0.9996 Max. :0.8034
## After-tax net Interest Rate Non-industry income and expenditure/revenue
## Min. :0.7789 Min. :0.2715
## 1st Qu.:0.8093 1st Qu.:0.3035
## Median :0.8094 Median :0.3035
## Mean :0.8093 Mean :0.3035
## 3rd Qu.:0.8095 3rd Qu.:0.3036
## Max. :0.8145 Max. :0.3130
## Continuous interest rate (after tax) Operating Expense Rate
## Min. :0.7488 Min. :0.000e+00
## 1st Qu.:0.7816 1st Qu.:0.000e+00
## Median :0.7816 Median :0.000e+00
## Mean :0.7815 Mean :1.896e+09
## 3rd Qu.:0.7817 3rd Qu.:3.550e+09
## Max. :0.7834 Max. :9.980e+09
## Research and development expense rate Cash flow rate
## Min. :0.00e+00 Min. :0.3466
## 1st Qu.:0.00e+00 1st Qu.:0.4620
## Median :4.41e+08 Median :0.4652
## Mean :1.83e+09 Mean :0.4680
## 3rd Qu.:3.05e+09 3rd Qu.:0.4712
## Max. :9.86e+09 Max. :0.6746
## Interest-bearing debt interest rate Tax rate (A) Net Value Per Share (B)
## Min. :0.00e+00 Min. :0.00000 Min. :0.1284
## 1st Qu.:0.00e+00 1st Qu.:0.00000 1st Qu.:0.1739
## Median :0.00e+00 Median :0.08761 Median :0.1844
## Mean :1.54e+07 Mean :0.12487 Mean :0.1912
## 3rd Qu.:0.00e+00 3rd Qu.:0.22191 3rd Qu.:0.1995
## Max. :9.90e+08 Max. :0.99190 Max. :0.4759
## Net Value Per Share (A) Net Value Per Share (C)
## Min. :0.1284 Min. :0.1284
## 1st Qu.:0.1739 1st Qu.:0.1739
## Median :0.1844 Median :0.1844
## Mean :0.1912 Mean :0.1912
## 3rd Qu.:0.1997 3rd Qu.:0.1997
## Max. :0.4759 Max. :0.4759
## Persistent EPS in the Last Four Seasons Cash Flow Per Share
## Min. :0.1096 Min. :0.2719
## 1st Qu.:0.2145 1st Qu.:0.3184
## Median :0.2235 Median :0.3228
## Mean :0.2277 Mean :0.3242
## 3rd Qu.:0.2384 3rd Qu.:0.3291
## Max. :0.4855 Max. :0.4413
## Revenue Per Share (Yuan ¥) Operating Profit Per Share (Yuan ¥)
## Min. :0.0001966 Min. :0.03835
## 1st Qu.:0.0157297 1st Qu.:0.09616
## Median :0.0286311 Median :0.10419
## Mean :0.0385714 Mean :0.10880
## 3rd Qu.:0.0455821 3rd Qu.:0.11563
## Max. :0.5777183 Max. :0.34468
## Per Share Net profit before tax (Yuan ¥)
## Min. :0.08872
## 1st Qu.:0.17078
## Median :0.17914
## Mean :0.18384
## 3rd Qu.:0.19295
## Max. :0.50908
## Realized Sales Gross Profit Growth Rate Operating Profit Growth Rate
## Min. :0.009889 Min. :0.8175
## 1st Qu.:0.022063 1st Qu.:0.8480
## Median :0.022098 Median :0.8480
## Mean :0.022221 Mean :0.8482
## 3rd Qu.:0.022144 3rd Qu.:0.8481
## Max. :0.081282 Max. :0.9322
## After-tax Net Profit Growth Rate Regular Net Profit Growth Rate
## Min. :0.6209 Min. :0.6198
## 1st Qu.:0.6893 1st Qu.:0.6893
## Median :0.6894 Median :0.6894
## Mean :0.6898 Mean :0.6898
## 3rd Qu.:0.6897 3rd Qu.:0.6896
## Max. :0.8782 Max. :0.8782
## Continuous Net Profit Growth Rate Total Asset Growth Rate
## Min. :0.1820 Min. :0.000e+00
## 1st Qu.:0.2176 1st Qu.:4.788e+09
## Median :0.2176 Median :6.360e+09
## Mean :0.2176 Mean :5.493e+09
## 3rd Qu.:0.2176 3rd Qu.:7.320e+09
## Max. :0.2393 Max. :9.960e+09
## Net Value Growth Rate Total Asset Return Growth Rate Ratio Cash Reinvestment %
## Min. :0.0002136 Min. :0.2590 Min. :0.3061
## 1st Qu.:0.0004420 1st Qu.:0.2638 1st Qu.:0.3755
## Median :0.0004623 Median :0.2640 Median :0.3810
## Mean :0.0005620 Mean :0.2643 Mean :0.3809
## 3rd Qu.:0.0004949 3rd Qu.:0.2644 3rd Qu.:0.3868
## Max. :0.0276540 Max. :0.3586 Max. :0.4751
## Current Ratio Quick Ratio Interest Expense Ratio
## Min. :0.0008761 Min. :0.000e+00 Min. :0.5874
## 1st Qu.:0.0079422 1st Qu.:0.000e+00 1st Qu.:0.6306
## Median :0.0111750 Median :0.000e+00 Median :0.6307
## Mean :0.0161195 Mean :7.683e+06 Mean :0.6309
## 3rd Qu.:0.0172112 3rd Qu.:0.000e+00 3rd Qu.:0.6311
## Max. :0.2668079 Max. :5.240e+09 Max. :0.6774
## Total debt/Total net worth Debt ratio % Net worth/Assets
## Min. :0.000e+00 Min. :0.0005744 Min. :0.7113
## 1st Qu.:0.000e+00 1st Qu.:0.0693382 1st Qu.:0.8515
## Median :0.000e+00 Median :0.1089579 Median :0.8910
## Mean :2.669e+06 Mean :0.1113335 Mean :0.8887
## 3rd Qu.:0.000e+00 3rd Qu.:0.1485473 3rd Qu.:0.9307
## Max. :1.820e+09 Max. :0.2886598 Max. :0.9994
## Long-term fund suitability ratio (A) Borrowing dependency
## Min. :0.004853 Min. :0.3696
## 1st Qu.:0.005258 1st Qu.:0.3701
## Median :0.005621 Median :0.3724
## Mean :0.008926 Mean :0.3751
## 3rd Qu.:0.006825 3rd Qu.:0.3762
## Max. :0.984855 Max. :0.6690
## Contingent liabilities/Net worth Operating profit/Paid-in capital
## Min. :0.005366 Min. :0.03861
## 1st Qu.:0.005366 1st Qu.:0.09617
## Median :0.005366 Median :0.10417
## Mean :0.005855 Mean :0.10873
## 3rd Qu.:0.005820 3rd Qu.:0.11529
## Max. :0.049600 Max. :0.34468
## Net profit before tax/Paid-in capital
## Min. :0.08762
## 1st Qu.:0.16979
## Median :0.17800
## Mean :0.18243
## 3rd Qu.:0.19126
## Max. :0.50852
## Inventory and accounts receivable/Net value Total Asset Turnover
## Min. :0.3937 Min. :0.002998
## 1st Qu.:0.3975 1st Qu.:0.074963
## Median :0.4002 Median :0.121439
## Mean :0.4027 Mean :0.139903
## 3rd Qu.:0.4045 3rd Qu.:0.179535
## Max. :0.5172 Max. :0.676162
## Accounts Receivable Turnover Average Collection Days
## Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0.000e+00
## Mean :2.082e+06 Mean :1.337e+07
## 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :1.420e+09 Max. :8.370e+09
## Inventory Turnover Rate (times) Fixed Assets Turnover Frequency
## Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0.000e+00
## Mean :2.020e+09 Mean :9.392e+08
## 3rd Qu.:4.125e+09 3rd Qu.:0.000e+00
## Max. :9.940e+09 Max. :9.680e+09
## Net Worth Turnover Rate (times) Revenue per person Operating profit per person
## Min. :0.009194 Min. :0.000141 Min. :0.3073
## 1st Qu.:0.021613 1st Qu.:0.011259 1st Qu.:0.3925
## Median :0.030000 Median :0.019144 Median :0.3955
## Mean :0.037704 Mean :0.034580 Mean :0.4002
## 3rd Qu.:0.043226 3rd Qu.:0.033293 3rd Qu.:0.4020
## Max. :0.327581 Max. :0.473989 Max. :0.8971
## Allocation rate per person Working Capital to Total Assets
## Min. :0.000e+00 Min. :0.6524
## 1st Qu.:0.000e+00 1st Qu.:0.7789
## Median :0.000e+00 Median :0.8137
## Mean :2.563e+07 Mean :0.8172
## 3rd Qu.:0.000e+00 3rd Qu.:0.8551
## Max. :9.570e+09 Max. :1.0000
## Quick Assets/Total Assets Current Assets/Total Assets Cash/Total Assets
## Min. :0.01245 Min. :0.02693 Min. :0.000433
## 1st Qu.:0.25308 1st Qu.:0.36765 1st Qu.:0.034489
## Median :0.38068 Median :0.51120 Median :0.079526
## Mean :0.40481 Mean :0.52628 Mean :0.127139
## 3rd Qu.:0.53725 3rd Qu.:0.68536 3rd Qu.:0.160944
## Max. :0.98894 Max. :0.99545 Max. :0.925018
## Quick Assets/Current Liability Cash/Current Liability
## Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.000e+00 Median :0.000e+00
## Mean :1.194e+07 Mean :4.826e+07
## 3rd Qu.:0.000e+00 3rd Qu.:0.000e+00
## Max. :8.140e+09 Max. :8.590e+09
## Current Liability to Assets Operating Funds to Liability
## Min. :0.003873 Min. :0.0000
## 1st Qu.:0.052810 1st Qu.:0.3419
## Median :0.081007 Median :0.3493
## Mean :0.088429 Mean :0.3554
## 3rd Qu.:0.113723 3rd Qu.:0.3618
## Max. :0.258370 Max. :0.9564
## Inventory/Working Capital Inventory/Current Liability
## Min. :0.2541 Min. :0.000e+00
## 1st Qu.:0.2770 1st Qu.:0.000e+00
## Median :0.2772 Median :0.000e+00
## Mean :0.2774 Mean :2.017e+07
## 3rd Qu.:0.2774 3rd Qu.:0.000e+00
## Max. :0.3424 Max. :6.370e+09
## Current Liabilities/Liability Working Capital/Equity
## Min. :0.1104 Min. :0.6126
## 1st Qu.:0.6260 1st Qu.:0.7339
## Median :0.8017 Median :0.7362
## Mean :0.7602 Mean :0.7357
## 3rd Qu.:0.9376 3rd Qu.:0.7385
## Max. :1.0000 Max. :0.7476
## Current Liabilities/Equity Long-term Liability to Current Assets
## Min. :0.3263 Min. :0.000e+00
## 1st Qu.:0.3281 1st Qu.:0.000e+00
## Median :0.3296 Median :0.000e+00
## Mean :0.3317 Mean :7.241e+07
## 3rd Qu.:0.3321 3rd Qu.:0.000e+00
## Max. :0.5261 Max. :9.310e+09
## Retained Earnings to Total Assets Total income/Total expense
## Min. :0.6840 Min. :0.000772
## 1st Qu.:0.9318 1st Qu.:0.002237
## Median :0.9375 Median :0.002331
## Mean :0.9351 Mean :0.002397
## 3rd Qu.:0.9447 3rd Qu.:0.002489
## Max. :0.9940 Max. :0.017451
## Total expense/Assets Current Asset Turnover Rate Quick Asset Turnover Rate
## Min. :0.00199 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.01436 1st Qu.:0.000e+00 1st Qu.:0.000e+00
## Median :0.02265 Median :0.000e+00 Median :0.000e+00
## Mean :0.02903 Mean :1.273e+09 Mean :2.147e+09
## 3rd Qu.:0.03566 3rd Qu.:0.000e+00 3rd Qu.:5.165e+09
## Max. :0.17126 Max. :9.960e+09 Max. :9.970e+09
## Working capitcal Turnover Rate Cash Turnover Rate Cash Flow to Sales
## Min. :0.5934 Min. :0.000e+00 Min. :0.6675
## 1st Qu.:0.5939 1st Qu.:0.000e+00 1st Qu.:0.6716
## Median :0.5940 Median :1.095e+09 Median :0.6716
## Mean :0.5940 Mean :2.518e+09 Mean :0.6716
## 3rd Qu.:0.5940 3rd Qu.:4.670e+09 3rd Qu.:0.6716
## Max. :0.6054 Max. :1.000e+10 Max. :0.6760
## Fixed Assets to Assets Current Liability to Liability
## Min. :0.0001865 Min. :0.1104
## 1st Qu.:0.0881918 1st Qu.:0.6260
## Median :0.2045842 Median :0.8017
## Mean :0.2494363 Mean :0.7602
## 3rd Qu.:0.3733647 3rd Qu.:0.9376
## Max. :0.8493070 Max. :1.0000
## Current Liability to Equity Equity to Long-term Liability
## Min. :0.3263 Min. :0.1109
## 1st Qu.:0.3281 1st Qu.:0.1109
## Median :0.3296 Median :0.1123
## Mean :0.3317 Mean :0.1160
## 3rd Qu.:0.3321 3rd Qu.:0.1169
## Max. :0.5261 Max. :0.2770
## Cash Flow to Total Assets Cash Flow to Liability CFO to Assets
## Min. :0.4181 Min. :0.0000 Min. :0.2542
## 1st Qu.:0.6316 1st Qu.:0.4569 1st Qu.:0.5697
## Median :0.6454 Median :0.4598 Median :0.5951
## Mean :0.6506 Mean :0.4620 Mean :0.5965
## 3rd Qu.:0.6665 3rd Qu.:0.4649 3rd Qu.:0.6255
## Max. :1.0000 Max. :0.9051 Max. :0.8340
## Cash Flow to Equity Current Liability to Current Assets Liability-Assets Flag
## Min. :0.2662 Min. :0.0008338 Min. :0
## 1st Qu.:0.3127 1st Qu.:0.0170451 1st Qu.:0
## Median :0.3150 Median :0.0261680 Median :0
## Mean :0.3156 Mean :0.0295904 Mean :0
## 3rd Qu.:0.3178 3rd Qu.:0.0365454 3rd Qu.:0
## Max. :0.3566 Max. :0.2571601 Max. :0
## Net Income to Total Assets Total assets to GNP price No-credit Interval
## Min. :0.5745 Min. :0.000e+00 Min. :0.0000
## 1st Qu.:0.7970 1st Qu.:0.000e+00 1st Qu.:0.6236
## Median :0.8104 Median :0.000e+00 Median :0.6239
## Mean :0.8079 Mean :2.199e+07 Mean :0.6231
## 3rd Qu.:0.8258 3rd Qu.:0.000e+00 3rd Qu.:0.6241
## Max. :0.9829 Max. :8.140e+09 Max. :0.9564
## Gross Profit to Sales Net Income to Stockholder's Equity Liability to Equity
## Min. :0.5127 Min. :0.6376 Min. :0.2748
## 1st Qu.:0.6000 1st Qu.:0.8401 1st Qu.:0.2768
## Median :0.6064 Median :0.8411 Median :0.2786
## Mean :0.6080 Mean :0.8401 Mean :0.2807
## 3rd Qu.:0.6136 3rd Qu.:0.8423 3rd Qu.:0.2815
## Max. :0.6651 Max. :0.8496 Max. :0.4843
## Degree of Financial Leverage (DFL)
## Min. :0.004429
## 1st Qu.:0.026791
## Median :0.026808
## Mean :0.026965
## 3rd Qu.:0.026897
## Max. :0.051601
## Interest Coverage Ratio (Interest expense to EBIT) Net Income Flag
## Min. :0.1721 Min. :1
## 1st Qu.:0.5652 1st Qu.:1
## Median :0.5653 Median :1
## Mean :0.5642 Mean :1
## 3rd Qu.:0.5657 3rd Qu.:1
## Max. :0.6325 Max. :1
## Equity to Liability class
## Min. :0.01069 Min. :0.00000
## 1st Qu.:0.02453 1st Qu.:0.00000
## Median :0.03463 Median :0.00000
## Mean :0.05197 Mean :0.02786
## 3rd Qu.:0.05559 3rd Qu.:0.00000
## Max. :0.88102 Max. :1.00000
The data were scaled and outliers were removed.
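The outlier function mentioned earlier is not shown in this section, so the sketch below substitutes a simple rule. It standardises each predictor and drops any row with an absolute z-score above `z_max` (3 by default, an arbitrary choice). It also drops constant columns first: the summary above shows some (for example `Net Income Flag`, always 1), and `scale()` would turn them into `NaN`. The helper name is hypothetical.

```r
# Sketch: drop constant columns, standardise, then remove rows with any |z| > z_max.
# The z-score rule is an assumed stand-in, not the outlier function used in the analysis.
scale_and_trim <- function(df, target = "class", z_max = 3) {
  pred <- setdiff(names(df), target)
  pred <- pred[vapply(df[pred], function(v) sd(v, na.rm = TRUE) > 0, logical(1))]
  z <- scale(df[, pred, drop = FALSE])                    # (x - mean) / sd per column
  keep <- apply(abs(z) <= z_max, 1, all, na.rm = TRUE)    # every |z| within the cutoff
  out <- as.data.frame(z[keep, , drop = FALSE])
  out[[target]] <- df[[target]][keep]
  out
}
```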
The ratio of bankrupt to non-bankrupt companies is similar to that in the Polish data.
The variables are sorted in decreasing order of correlation. The correlations are considerably higher than for the previous datasets, but they are still below 0.5.
However, the variables are strongly correlated with one another, so principal component analysis could be an appropriate way to reduce dimensionality.
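To check which predictors are strongly correlated with each other before applying PCA, one can list the variable pairs whose absolute Pearson correlation exceeds a threshold. The threshold of 0.9 and the helper name are assumptions.

```r
# Sketch: pairs of predictors with |r| above a threshold, strongest first.
high_cor_pairs <- function(x, threshold = 0.9) {
  r <- cor(x, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
  out[order(abs(out$r), decreasing = TRUE), , drop = FALSE]
}
```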